\clearpage
\subsection{Vector SHA2 Acceleration - Per Round}
\label{sec:vector:sha2:per-round}

\begin{cryptoisa}
vsha2ms.vv  vrd, vrs1      // Update message states by 16 rounds.
vsha2wsi.vv vrt, vrs1, rnd // Update working states by 16 rounds.
\end{cryptoisa}

These instructions are used to accelerate the SHA-256 and SHA-512
hash functions.
The exact hash algorithm performed by the instructions is
polymorphic based on {\tt vtype.vsew}.
To perform SHA-256 functions, set $\SEW=256$.
To perform SHA-512 functions, set $\SEW=512$.
Executing these instructions with any other \SEW value will
result in an Invalid Opcode Exception.

The \mnemonic{vsha2ms.vv} instruction updates the current {\em message state}
stored in \vrs{1} with $16$ rounds of the SHA256/SHA512 hash function
as defined by \SEW.
The $16$ rounds are applied to each pair of $2*\SEW$
elements stored in \vrs{1}.
The result is then written back to \vrd as $2*\SEW$ elements.

The \mnemonic{vsha2wsi.vv} instruction updates the current {\em working state}
by $16$ rounds.
The $3$-bit {\tt rnd} immediate is used to identify the first of
the $16$ rounds to apply, and hence select appropriate round constants.
For SHA-256, valid {\tt rnd} values are $0, 16, 32$   and $48$.
For SHA-512, valid {\tt rnd} values are $0, 16, 32, 48$ and $64$.
Executing \mnemonic{vsha2wsi.vv} with
${\tt rnd}=64$ 
and
{\tt SEW}=$256$
will result in an Invalid Opcode Exception.

\todo{The vsha2ws.vv immediate requires $3$ bits but only needs to express
up to $5$ values. Recommend embedding the immediate in the encoding directly
to make the instructions require fewer encoding points.
They can still be written as above in assembly to avoid confusing
mnemonic names.}

To support \mnemonic{vsha2*} instructions with $\SEW=256$ (i.e. SHA256),
implementations must support $\ELEN \ge 512$.
To support \mnemonic{vsha2*} instructions with $\SEW=512$ (i.e. SHA512),
implementations must support $\ELEN \ge 1024$.
Executing these instructions with said parameters on an implementation
not meeting these criteria will cause an Invalid Opcode Exception.

\begin{figure}[h]
\begin{lstlisting}[language=pseudo]
TBD
\end{lstlisting}
\caption{Pseudocode for the round-based SHA2 vector instructions.}
\label{fig:pseudo:sha:per-round}
\end{figure}

\subsection{Vector SHA2 Acceleration - All Rounds}
\label{sec:vector:sha2:all-round}

\begin{cryptoisa}
vsha2hs.vv vrt, vrs1      // Update hash states (all rounds).
\end{cryptoisa}

This instruction is used to accelerate the SHA-256 and SHA-512
hash functions.
The exact hash algorithm performed by the instructions is
polymorphic based on {\tt vtype.vsew}.
To perform SHA-256 functions, set $\SEW=256$.
To perform SHA-512 functions, set $\SEW=512$.
Executing these instructions with any other \SEW value will
result in an Invalid Opcode Exception.

The \mnemonic{vsha2hs.vv} instruction performs all
$64$ (resp. $80$) rounds of the SHA-256 (resp. SHA-512) block function.
The current hash state is sourced from \vrt,
with $\EEW=\SEW$ and $\EMUL=\LMUL$.
The input message chunks are sourced from \vrs{1},
with $\EEW=2*\SEW$ and $\EMUL=2*\LMUL$.
Hence, each $2*\SEW$-wide message chunk element of \vrs{1} is added into each
\SEW-wide state element of \vrt.
Each\SEW-wide result element is then written back to \vrt.

\begin{figure}[h]
\begin{lstlisting}[language=pseudo]
TBD
\end{lstlisting}
\caption{Pseudocode for the all round vector SHA2 instructions.}
\label{fig:pseudo:sha:all-round}
\end{figure}
